# High-Precision Image Captioning
## PixelReasoner-RL-V1 (TIGER-Lab)
Apache-2.0 · Image-to-Text · Transformers · English

PixelReasoner is a vision-language model built on Qwen2.5-VL-7B-Instruct and trained with curiosity-driven reinforcement learning, focused on image-text-to-text tasks.
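As a rough usage sketch, a Qwen2.5-VL-derived captioner like this one can typically be driven through the standard Transformers image-text-to-text interface. The repository ID, image path, and prompt below are assumptions for illustration, not details taken from the listing:

```python
# Hedged sketch: captioning with a Qwen2.5-VL-derived model via Transformers.
# The repo ID ("TIGER-Lab/PixelReasoner-RL-v1"), image file, and prompt are assumptions.
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "TIGER-Lab/PixelReasoner-RL-v1"  # assumed repository name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in detail."},
    ],
}]
# Build the chat-formatted prompt, then pack image + text into model inputs.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
caption = processor.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(caption)
```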
## Qwen2.5-VL-3B-Instruct-quantized.w4a16 (RedHatAI)
Apache-2.0 · Image-to-Text · Transformers · English

A quantized version of Qwen2.5-VL-3B-Instruct with weights quantized to INT4 and activations kept in FP16 (w4a16), intended for efficient inference on vision-text tasks.
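A common way to use a w4a16 checkpoint like this is to serve it with an inference engine such as vLLM, which dequantizes the INT4 weights while keeping activations in 16-bit. The sketch below is an assumption-laden example: the repository ID, prompt template, and image are placeholders, not a verified recipe from the model card:

```python
# Hedged serving sketch for an INT4-weight / FP16-activation (w4a16) checkpoint
# using vLLM. The repo ID and the Qwen-style image placeholder prompt are assumptions.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/Qwen2.5-VL-3B-Instruct-quantized.w4a16",  # assumed repo ID
    max_model_len=4096,
)
params = SamplingParams(temperature=0.2, max_tokens=128)

# Qwen2.5-VL-style prompt: the vision placeholder tokens mark where image
# features are spliced in; the image itself is passed as multi_modal_data.
prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>Describe the image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": Image.open("example.jpg")}},
    params,
)
print(outputs[0].outputs[0].text)
```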
## Asagi-4B (MIL-UT)
Apache-2.0 · Image-to-Text · Transformers · Japanese

Asagi-4B is a large-scale Japanese vision-language model (VLM) trained on extensive Japanese datasets drawn from diverse sources.